Bandit Algorithms for Tree Search
Authors
Pierre-Arnaud Coquelin, Rémi Munos
Abstract
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree enables them to return a good value rapidly and to improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is “over-optimistic” in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, we analyze a modification of UCT using a confidence sequence that scales exponentially with the horizon depth. We then consider Flat-UCB, performed directly on the leaves, and provide a finite regret bound that holds with high probability. Next, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST), which takes the actual smoothness of the rewards into account to perform efficient “cuts” of sub-optimal branches with high confidence. Finally, we present an incremental tree expansion, which applies when the full tree is too big (possibly infinite) to be represented entirely, and show that with high probability only the optimal branches are developed indefinitely. We illustrate these methods on the global optimization of a continuous function, given noisy values.
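To make the selection rules behind UCT and BAST concrete, here is a minimal Python sketch of UCB-style node scoring with an optional smoothness bonus. The function names, the node representation, and the `delta` term are illustrative assumptions, not the paper's exact definitions.

```python
import math

# Minimal sketch: UCT descends the tree by repeatedly picking the child
# that maximizes an upper confidence bound on its value. BAST-style
# variants add a smoothness term delta(depth), bounding how much better
# any leaf below a node can be, so that sub-optimal branches can be
# "cut" with high confidence. (Illustrative, not the paper's formulas.)

def ucb_score(mean, n_node, n_parent, c=math.sqrt(2)):
    """Empirical mean plus an exploration bonus that shrinks with visits."""
    if n_node == 0:
        return float("inf")  # force at least one visit per child
    return mean + c * math.sqrt(math.log(n_parent) / n_node)

def smooth_score(mean, n_node, n_parent, depth, delta):
    """UCB score plus an assumed smoothness bonus delta(depth)."""
    return ucb_score(mean, n_node, n_parent) + delta(depth)

def select_child(children, depth, delta=lambda d: 0.0):
    """Pick the child with the highest (smoothness-aware) upper bound.
    children: list of dicts with keys "mean" and "visits"."""
    n_parent = sum(ch["visits"] for ch in children) + 1
    return max(children, key=lambda ch:
               smooth_score(ch["mean"], ch["visits"], n_parent, depth, delta))
```

With `delta` identically zero this reduces to plain UCT selection; a decreasing `delta(depth)` encodes the smoothness assumption BAST exploits.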
Similar Resources
Bandit Algorithms for Tree Search (Pierre-Arnaud Coquelin)
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [GWMT06]. The UCT algorithm [KS06], a tree search method based on Upper Confidence Bounds (UCB) [ACBF02], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is too “optimistic” in some cases, leading to a regret of Ω(exp(exp(D))), where...
Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes
The combination of multi-armed bandit (MAB) algorithms with Monte-Carlo tree search (MCTS) has made a significant impact in various research fields. The UCT algorithm, which combines the UCB bandit algorithm with MCTS, is a good example of the success of this combination. The recent breakthrough made by AlphaGo, which incorporates convolutional neural networks with bandit algorithms in MCTS, al...
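For reference, the UCB1 rule that UCT applies at each node can be written as a short bandit loop. This sketch uses my own variable names and Bernoulli arms as stand-ins for rollout returns; it is a generic illustration, not code from the paper.

```python
import math
import random

# UCB1 bandit loop: play each arm once, then always play the arm with
# the highest empirical mean plus exploration bonus. Pull counts
# progressively concentrate on the best arm.

def ucb1(arm_means, horizon):
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization: play each arm once
        else:
            a = max(range(k), key=lambda i:
                    sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if random.random() < arm_means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts

# Example: ucb1([0.4, 0.5, 0.6], horizon=10_000) pulls arm 2 most often.
```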
The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games
Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called Naïve Sampling, based on a variant of the Multi-armed Bandit problem called the Combinatorial Multi-armed Bandit (CMAB) problem. We present a new MCTS algorithm based on Naïve Sampling call...
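The core idea of naïve sampling, as I read it, is to explore a combinatorial action space (one action per unit) by sampling each component independently, while exploiting the best combined action observed so far. The sketch below is a loose approximation under that reading; the full algorithm also maintains per-component bandits, and `pull`, `unit_actions`, and `epsilon` are my own names.

```python
import random

# Loose sketch of naive sampling for a combinatorial bandit: a
# "macro-arm" is a tuple with one action per unit. Exploration samples
# each unit's action independently (the naive, factored policy);
# exploitation replays the macro-arm with the best empirical mean.

def naive_sampling(unit_actions, pull, iterations, epsilon=0.3):
    stats = {}  # macro-arm -> (pull count, reward sum)
    for _ in range(iterations):
        if random.random() < epsilon or not stats:
            arm = tuple(random.choice(acts) for acts in unit_actions)
        else:
            arm = max(stats, key=lambda m: stats[m][1] / stats[m][0])
        c, s = stats.get(arm, (0, 0.0))
        stats[arm] = (c + 1, s + pull(arm))
    return max(stats, key=lambda m: stats[m][1] / stats[m][0])

# Example with two units and a noisy reward favoring ("b", "y"):
# best = naive_sampling([["a", "b"], ["x", "y"]],
#                       pull=lambda arm: random.random() + (arm == ("b", "y")),
#                       iterations=2000)
```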
Bandit Algorithms in Game Tree Search: Application to Computer Renju
The multi-armed bandit problem is to maximize a cumulative reward by playing arms sequentially without prior knowledge. Algorithms for this problem, such as UCT, have been successfully extended to computer Go programs and proved significantly effective by defeating professional players. The goal of the project is to implement a Renju AI based on Monte Carlo planning that is able to defeat the oldest k...
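For reference, the cumulative-reward objective mentioned in this snippet is usually stated via the regret; the following is the standard textbook definition, not notation from the project itself.

```latex
% Cumulative regret after n plays: the shortfall against always
% playing the best arm, whose mean reward is \mu^* = \max_i \mu_i.
% I_t denotes the arm chosen at round t, X_{I_t,t} the reward received.
R_n = n\,\mu^* - \mathbb{E}\left[ \sum_{t=1}^{n} X_{I_t,t} \right]
```

Good bandit algorithms such as UCB1 keep this regret growing only logarithmically in n, which is what makes them attractive inside tree search.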
On MABs and Separation of Concerns in Monte-Carlo Planning for MDPs
Linking online planning for MDPs with their special case of stochastic multi-armed bandit problems, we analyze three state-of-the-art Monte-Carlo tree search algorithms: UCT, BRUE, and MaxUCT. Using the outcome, we (i) introduce two new MCTS algorithms, MaxBRUE, which combines uniform sampling with Bellman backups, and MpaUCT, which combines UCB1 with a novel backup procedure, (ii) analyze them...
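The distinction this snippet draws between backup procedures (UCT's Monte-Carlo averaging versus Bellman-style max backups as in MaxUCT) can be summarized in a few lines. This is a generic sketch with my own `Node` structure, not the authors' code.

```python
from dataclasses import dataclass, field

# Generic sketch of the two backup rules contrasted above: UCT keeps a
# running average of rollout returns at each node, whereas a
# Bellman-style (max) backup sets a node's value to its best child's.

@dataclass
class Node:
    value: float = 0.0
    visits: int = 0
    children: list = field(default_factory=list)

def mc_backup(node, rollout_return):
    """UCT-style backup: running mean of all returns through the node."""
    node.visits += 1
    node.value += (rollout_return - node.value) / node.visits

def max_backup(node):
    """MaxUCT-style backup: Bellman max over the children's values."""
    node.visits += 1
    node.value = max(child.value for child in node.children)
```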